Supplementary Notes 



1 GBR optimization 

In this section we derive an efficient alternating optimization algorithm for the GBR objective
(Methods). We first describe how to compute $\arg\max_q \mathcal{J}'_{\text{GBR}}(\theta, q)$, then describe how this algorithm
can be used in combination with an EM-like algorithm for learning $\theta$.

GBR can be employed either as a regularizer for training the parameters or for inference directly.
In the training case, the EM-like algorithm described below is used to compute and output $\theta$, which can
then be used for inference either with or without GBR. In the inference case, $q$ is computed and
output as the posterior marginals. In our genomics experiments we trained our models without
GBR and used GBR for inference only.



Optimizing q 

The GBR regularizer $\mathcal{R}'_{\text{GBR}}(\theta, q)$ is convex in $q$; therefore, we could compute $q$ using any convex
optimization algorithm. However, general-purpose convex optimization algorithms do not scale to
problems with millions or billions of variables, such as those that arise in genomics. Therefore, we instead
propose a novel alternating maximization strategy that performs this optimization more efficiently.

To enable efficient inference, we reformulate $\mathcal{J}'_{\text{GBR}}(\theta, q)$ by introducing a new variable $r^M(X_H)$.
Like $q$, $r^M$ is a distribution over $X_H$, but we require that $r^M$ be factorizable as a product of
marginals; that is, $r^M(x_H) = \prod_{h \in H} r^M_h(x_h)$. We define the graph regularizer over $r^M$ and add an
additional term $\lambda_{R1} D(q(X_H) \| r^M(X_H))$, which encourages $q$ and $r^M$ to be similar. As we will show
below, restricting $r^M$ in this way means that the reformulated objective is a lower bound on the
original rather than being equivalent. We will maximize this lower bound as an approximation to
maximizing the original. The reformulated regularizer is

\begin{align}
\mathcal{PR}'_{\text{GBR-R1}}(q, r^M) &\triangleq -\lambda_{R1} D(q(X_H) \| r^M(X_H)) + f_{R1}(r^M) \tag{1} \\
f_{R1}(r^M) &\triangleq -\lambda_G \sum_{(u,v) \in E_{\text{GBR}}} w(u,v)\, D(r^M_u(X_u) \| r^M_v(X_v)), \tag{2}
\end{align}

and $\mathcal{J}'_{\text{GBR-R1}}(\theta, q, r^M)$ and $\mathcal{R}'_{\text{GBR-R1}}(\theta, q, r^M)$ are defined according to Equations (2) and (4),
respectively, using the corresponding regularizers. That is,

\begin{align}
\underset{\theta, q, r^M}{\text{maximize}} \quad \mathcal{J}'_{\text{GBR-R1}}(\theta, q, r^M) &= \mathcal{L}(\theta) + \mathcal{R}'_{\text{GBR-R1}}(\theta, q, r^M), \tag{3} \\
\mathcal{R}'_{\text{GBR-R1}}(\theta, q, r^M) &\triangleq -D(q(X_H) \| p_\theta(X_H \mid x_O)) + \mathcal{PR}'_{\text{GBR-R1}}(q, r^M). \tag{4}
\end{align}

First, we show that $r^M \approx q$ for large values of $\lambda_{R1}$, so optimizing the reformulated regularizer is
equivalent to optimizing a lower bound on the original.

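Lemma 1. For distributions $p \in \mathcal{P}$ and $q \in \mathcal{Q}$, where $\mathcal{P} \cap \mathcal{Q} \neq \emptyset$, and a continuous function $J(p, q)$,
let $J(p, q; \lambda) = J(p, q) - \lambda D(p \| q)$ and $p^*_\lambda, q^*_\lambda \in \arg\max_{p \in \mathcal{P}, q \in \mathcal{Q}} J(p, q; \lambda)$. Then the following hold:

\begin{align}
&\lim_{\lambda \to \infty} D(p^*_\lambda \| q^*_\lambda) = 0, \tag{5} \\
&\lim_{\lambda \to \infty} \|p^*_\lambda - q^*_\lambda\|_\ell = 0 \text{ for any } \ell, \text{ where } \|\cdot\|_\ell \text{ is the } \ell\text{-norm, and} \tag{6} \\
&\lim_{\lambda \to \infty} \max_{p \in \mathcal{P}, q \in \mathcal{Q}} J(p, q; \lambda) \le \max_{p \in \mathcal{P}} J(p, p). \tag{7}
\end{align}

Proof. Consider any $\epsilon > 0$ and any $p' \in \mathcal{P}$, $q' \in \mathcal{Q}$ such that $D(p' \| q') > \epsilon$. Let $\hat{p} \in \arg\max_{p \in \mathcal{P} \cap \mathcal{Q}} J(p, p)$
and consider any $\lambda' > (1/\epsilon)(J(p', q') - J(\hat{p}, \hat{p}))$. Then

\begin{align}
J(p', q'; \lambda') &= J(p', q') - \lambda' D(p' \| q') \tag{8} \\
&< J(p', q') - \lambda' \epsilon \tag{9} \\
&< J(p', q') - \frac{\epsilon}{\epsilon}\big(J(p', q') - J(\hat{p}, \hat{p})\big) \tag{10} \\
&= J(\hat{p}, \hat{p}). \tag{11}
\end{align}

Because the pair $(\hat{p}, \hat{p})$ is feasible and attains $J(\hat{p}, \hat{p}; \lambda') = J(\hat{p}, \hat{p})$, no pair with divergence greater than $\epsilon$
can be a maximizer. Therefore, $D(p^*_\lambda \| q^*_\lambda) \le \epsilon$ when $\lambda > \lambda'$. This proves Proposition (5).
We have that

\begin{align}
D(p \| q) &\ge \tfrac{1}{2} \|p - q\|_1^2 \tag{12} \\
&\ge \tfrac{1}{2} \|p - q\|_\ell^2 \tag{13}
\end{align}

for any $\ell$-norm. The first inequality is Pinsker's inequality and the second follows from the
relationship between $\ell$-norms. Proposition (6) follows from this combined with Proposition (5).
Due to Proposition (6) and the continuity of $J(p, q)$,

$$\lim_{\lambda \to \infty} \max_{p \in \mathcal{P}, q \in \mathcal{Q}} J(p, q; \lambda) - \max_{p \in \mathcal{P} \cap \mathcal{Q}} J(p, p) = 0. \tag{14}$$

Proposition (7) follows from this and the fact that $\mathcal{P} \cap \mathcal{Q} \subseteq \mathcal{P}$. □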
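
The inequality chain used for Proposition (6) can be spot-checked numerically. The following is a minimal sketch for illustration only, not part of the proof: the helper names (`kl`, `norm`, `check_pinsker`) and the randomly drawn test distributions are assumptions introduced here, and divergences are measured in nats.

```python
import math
import random

def kl(p, q):
    """Kullback-Leibler divergence D(p || q) in nats for discrete distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def norm(p, q, ell):
    """The l-norm of the difference p - q."""
    return sum(abs(a - b) ** ell for a, b in zip(p, q)) ** (1.0 / ell)

def random_dist(k, rng):
    """Draw a strictly positive distribution over k outcomes."""
    weights = [rng.random() + 1e-3 for _ in range(k)]
    total = sum(weights)
    return [w / total for w in weights]

def check_pinsker(trials=1000, k=5, seed=0):
    """Check D(p||q) >= 0.5 * ||p-q||_1^2 >= 0.5 * ||p-q||_2^2 on random pairs."""
    rng = random.Random(seed)
    for _ in range(trials):
        p, q = random_dist(k, rng), random_dist(k, rng)
        half_l1 = 0.5 * norm(p, q, 1) ** 2
        half_l2 = 0.5 * norm(p, q, 2) ** 2
        # Small tolerances guard against floating-point round-off only.
        if not (kl(p, q) + 1e-12 >= half_l1 >= half_l2 - 1e-12):
            return False
    return True
```

Because the bound holds for every pair of distributions, driving the divergence to zero forces every $\ell$-norm distance to zero as well, which is the step the proof relies on.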

Therefore, for sufficiently large $\lambda_{R1}$, optimizing Equation (2) is equivalent to optimizing a lower
bound on Equation (5). This form allows us to compute $q$ efficiently, as shown next.

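Theorem 2. Define $q^*(X_H) \triangleq \arg\max_q \mathcal{J}'_{\text{GBR-R1}}(\theta, q, r^M)$. Then,

$$q^*(x_H) = \frac{p_\theta(x_H, x_O)^{1/(1+\lambda_{R1})} \prod_{h \in H} r^M_h(x_h)^{\lambda_{R1}/(1+\lambda_{R1})}}{\sum_{x'_H} p_\theta(x'_H, x_O)^{1/(1+\lambda_{R1})} \prod_{h \in H} r^M_h(x'_h)^{\lambda_{R1}/(1+\lambda_{R1})}}. \tag{15}$$

Proof. For ease of notation, we group all terms that do not depend on $q$ into one function $K_2(r)$.
Since we must respect the sum-to-one property of $q$, we form the Lagrangian by adding the term
$-\lambda_2 (1 - \sum_{x_H} q(x_H))$:

\begin{align}
L_2(q, \lambda_2) &= -D(q(X_H) \| p_\theta(X_H \mid x_O)) - \lambda_{R1} D(q(X_H) \| r^M(X_H)) - \lambda_2 \Big(1 - \sum_{x_H} q(x_H)\Big) + K_2(r) \tag{16} \\
&= -\sum_{x_H} q(x_H) \log \frac{q(x_H)}{p_\theta(x_H \mid x_O)} - \lambda_{R1} \sum_{x_H} q(x_H) \log \frac{q(x_H)}{r^M(x_H)} - \lambda_2 \Big(1 - \sum_{x_H} q(x_H)\Big) + K_2(r) \tag{17} \\
&= -\sum_{x_H} q(x_H) \log \frac{q(x_H)^{1+\lambda_{R1}}}{p_\theta(x_H, x_O)\, r^M(x_H)^{\lambda_{R1}}} - \log p_\theta(x_O) - \lambda_2 \Big(1 - \sum_{x_H} q(x_H)\Big) + K_2(r) \tag{18}
\end{align}

Setting the derivative with respect to $q(x_H)$ to zero gives

\begin{align}
0 = \frac{\partial L_2}{\partial q(x_H)} &= \log\big[p_\theta(x_H, x_O)\, r^M(x_H)^{\lambda_{R1}}\big] - \log q(x_H)^{1+\lambda_{R1}} - (1 + \lambda_{R1}) + \lambda_2 \tag{19} \\
\Rightarrow \quad q(x_H) &\propto p_\theta(x_H, x_O)^{\frac{1}{1+\lambda_{R1}}}\, r^M(x_H)^{\frac{\lambda_{R1}}{1+\lambda_{R1}}}. \tag{20}
\end{align}

□

Critically, because $r^M$ is factorizable such that each factor involves just one variable $X_h$, $q^*(X_H)$
obeys the same factorization properties as the unregularized model $p_\theta(X_H, x_O)$. For example, if
the original model is an HMM, $q^*$ still factors as a chain. Therefore, the normalization constant
can be computed using any algorithm for exact or approximate probabilistic inference on factorized
models, such as belief propagation, at a computational cost similar to that of the unregularized model.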
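
To make the form of this update concrete, the sketch below evaluates it on a state space small enough to enumerate. It is an illustration under stated assumptions, not the authors' implementation: the function names and the toy distributions are hypothetical, the model posterior $p_\theta(X_H \mid x_O)$ is passed as an explicit list (normalizing by $p_\theta(x_O)$ does not change the result), and in a real model the normalizer would come from belief propagation rather than enumeration.

```python
import math

def kl(q, p):
    """D(q || p) in nats for discrete distributions given as lists."""
    return sum(a * math.log(a / b) for a, b in zip(q, p) if a > 0)

def normalize(weights):
    """Scale non-negative weights so that they sum to one."""
    total = sum(weights)
    return [w / total for w in weights]

def q_update(p_post, r, lam_r1):
    """Tempered product: q proportional to p^(1/(1+lam)) * r^(lam/(1+lam))."""
    a = 1.0 / (1.0 + lam_r1)
    b = lam_r1 / (1.0 + lam_r1)
    return normalize([(pi ** a) * (ri ** b) for pi, ri in zip(p_post, r)])

def objective(q, p_post, r, lam_r1):
    """The q-dependent part of the objective: -D(q||p) - lam * D(q||r)."""
    return -kl(q, p_post) - lam_r1 * kl(q, r)

# Hypothetical three-state example.
p_post = [0.7, 0.2, 0.1]
r = [0.2, 0.3, 0.5]
q_star = q_update(p_post, r, lam_r1=2.0)
```

With `lam_r1 = 0` the update returns the model posterior unchanged, and as `lam_r1` grows the result moves toward `r`, which matches the role of $\lambda_{R1}$ as the strength of the coupling between $q$ and $r^M$.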

Optimizing $r^M$

Despite the last reformulation, the objective still does not admit closed-form updates for $r^M$.
Therefore, we again reformulate $\mathcal{PR}'_{\text{GBR-R1}}(q, r^M)$ by adding a new variable $s^M$, where $s^M$ is
also a distribution over $X_H$ restricted to be factorizable as a product of marginals. As before,
we add a term $\lambda_{R2} D(s^M(X_H) \| r^M(X_H))$, which encourages $s^M \approx r^M$. We define the graph
regularizer KL divergence terms to have $s^M$ on the left and $r^M$ on the right, that is, in the form
$D(s^M_u(X_u) \| r^M_v(X_v))$, which will enable efficient optimization for both variables.

\begin{align}
\mathcal{PR}'_{\text{GBR-R2}}(q, r^M, s^M) &\triangleq -\lambda_{R1} D(q(X_H) \| r^M(X_H)) + f_{R2}(r^M, s^M) \tag{21} \\
f_{R2}(r^M, s^M) &\triangleq -\lambda_{R2} D(s^M(X_H) \| r^M(X_H)) - \lambda_G \sum_{(u,v) \in E_{\text{GBR}}} w(u,v)\, D(s^M_u(X_u) \| r^M_v(X_v)),
\end{align}

and $\mathcal{J}'_{\text{GBR-R2}}(\theta, q, r^M, s^M)$ and $\mathcal{R}'_{\text{GBR-R2}}(\theta, q, r^M, s^M)$ are defined according to Equations (2) and (4),
respectively, using the corresponding regularizers. That is,

\begin{align}
\underset{\theta, q, r^M, s^M}{\text{maximize}} \quad \mathcal{J}'_{\text{GBR-R2}}(\theta, q, r^M, s^M) &\triangleq \mathcal{L}(\theta) + \mathcal{R}'_{\text{GBR-R2}}(\theta, q, r^M, s^M), \tag{22} \\
\mathcal{R}'_{\text{GBR-R2}}(\theta, q, r^M, s^M) &\triangleq -D(q(X_H) \| p_\theta(X_H \mid x_O)) + \mathcal{PR}'_{\text{GBR-R2}}(q, r^M, s^M). \tag{23}
\end{align}

By Lemma 1, optimizing $\mathcal{PR}'_{\text{GBR-R2}}(q)$ is equivalent to optimizing $\mathcal{PR}'_{\text{GBR-R1}}(q)$ for large values of $\lambda_{R2}$.
This regularizer can be optimized in $r^M$ and $s^M$ using closed-form updates, as shown next.

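Theorem 3. For notational simplicity, define a new regularization graph with self-edges of weight
$\lambda_{R2}/\lambda_G$: $E'_{\text{GBR}} = E_{\text{GBR}} \cup \{(h, h) \mid h \in H\}$ and $w'(u,v) = w(u,v) + \delta(u = v)\lambda_{R2}/\lambda_G$. Let
$r^{M*}(X_H) \in \arg\max_{r^M} \mathcal{J}'_{\text{GBR-R2}}(\theta, q, r^M, s^M)$ and $s^{M*}(X_H) \in \arg\max_{s^M} \mathcal{J}'_{\text{GBR-R2}}(\theta, q, r^M, s^M)$. Then,

\begin{align}
r^{M*}_v(x_v) &= \frac{\lambda_{R1} q^M_v(x_v) + \lambda_G \sum_{(u,v) \in E'_{\text{GBR}}} w'(u,v)\, s^M_u(x_v)}{\lambda_{R1} + \lambda_G \sum_{(u,v) \in E'_{\text{GBR}}} w'(u,v)}, \tag{24} \\
s^{M*}_u(x_u) &= \frac{\exp\left( \dfrac{\sum_{(u,v) \in E'_{\text{GBR}}} w'(u,v) \log r^M_v(x_u)}{\sum_{(u,v) \in E'_{\text{GBR}}} w'(u,v)} \right)}{\sum_{x'_u} \exp\left( \dfrac{\sum_{(u,v) \in E'_{\text{GBR}}} w'(u,v) \log r^M_v(x'_u)}{\sum_{(u,v) \in E'_{\text{GBR}}} w'(u,v)} \right)}. \tag{25}
\end{align}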

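Proof. In its current form, $\mathcal{PR}'_{\text{GBR-R2}}(q)$ involves a sum over all values of $q(X_H)$. However, the
following lemma shows how the factorizability of $r^M$ allows the objective to be expressed in a form
that involves only sums over the values of each variable $X_h$.

Lemma 4. For a distribution $p(X_V)$ and a factorizable distribution $q^M(X_V) = \prod_{v \in V} q^M_v(X_v)$,

$$D(p \| q^M) = \sum_{v \in V} D(p^M_v(X_v) \| q^M_v(X_v)) - H(p) + \sum_{v \in V} H(p^M_v(X_v)), \tag{26}$$

where $p^M_v(x_v) = \sum_{x_{V \setminus v}} p(x_V)$ is the marginal of $p$ over $X_v$.

Proof.

\begin{align}
D(p \| q^M) &= -H(p) - \sum_{x_V} p(x_V) \log \prod_{v \in V} q^M_v(x_v) \tag{27} \\
&= -H(p) - \sum_{x_V} p(x_V) \sum_{v \in V} \log q^M_v(x_v) \tag{28} \\
&= -H(p) - \sum_{v \in V} \sum_{x_v} \sum_{x_{V \setminus v}} p(x_V) \log q^M_v(x_v) \tag{29} \\
&= -H(p) - \sum_{v \in V} \sum_{x_v} \big(\log q^M_v(x_v)\big) \sum_{x_{V \setminus v}} p(x_V) \tag{30} \\
&= -H(p) - \sum_{v \in V} \sum_{x_v} \big(\log q^M_v(x_v)\big)\, p^M_v(x_v) \tag{31} \\
&= \sum_{v \in V} D(p^M_v(X_v) \| q^M_v(X_v)) - H(p) + \sum_{v \in V} H(p^M_v(X_v)) \tag{32}
\end{align}

□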
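
The decomposition in Lemma 4 can be verified numerically on a small joint distribution. The sketch below is illustrative only; the function names and the two-variable example are assumptions. It checks the identity in the form $D(p \| q^M) = \sum_v D(p_v \| q_v) - H(p) + \sum_v H(p_v)$, where $p_v$ is the marginal of $p$ over $X_v$ and $q_v$ is the corresponding factor of $q^M$.

```python
import math

def entropy(dist):
    """Shannon entropy in nats."""
    return -sum(v * math.log(v) for v in dist if v > 0)

def kl(p, q):
    """D(p || q) in nats for discrete distributions given as lists."""
    return sum(a * math.log(a / b) for a, b in zip(p, q) if a > 0)

def lemma4_sides(joint, factors):
    """Return both sides of the Lemma 4 identity.

    joint:   dict mapping assignments (tuples of state indices) to probabilities.
    factors: list of per-variable distributions whose product defines q^M.
    """
    n = len(factors)
    # Marginals of the joint distribution, one list per variable.
    marginals = [[0.0] * len(f) for f in factors]
    for x, px in joint.items():
        for v in range(n):
            marginals[v][x[v]] += px
    # Left-hand side: D(p || q^M) computed directly over joint assignments.
    lhs = sum(
        px * (math.log(px) - sum(math.log(factors[v][x[v]]) for v in range(n)))
        for x, px in joint.items() if px > 0
    )
    # Right-hand side: per-variable divergences plus the entropy correction.
    rhs = (
        sum(kl(marginals[v], factors[v]) for v in range(n))
        - entropy(list(joint.values()))
        + sum(entropy(marginals[v]) for v in range(n))
    )
    return lhs, rhs

# Hypothetical example: two correlated binary variables.
joint = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.3}
factors = [[0.6, 0.4], [0.3, 0.7]]
lhs, rhs = lemma4_sides(joint, factors)
```

Only the per-variable divergence terms depend on the factorized distribution, which is why the optimization over $r^M$ needs only the marginals of $q$.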



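Define $q^M_h$ to be the marginal distribution of $q$ over $X_h$, $q^M_h(x_h) = \sum_{x_{H \setminus h}} q(x_H)$. Using
Lemma 4,

$$\mathcal{PR}'_{\text{GBR-R2}}(q) = \max_{r^M, s^M} \left( -\lambda_{R1} \Big[ \sum_{h \in H} D(q^M_h(X_h) \| r^M_h(X_h)) - H(q) + \sum_{h \in H} H(q^M_h(X_h)) \Big] + f_{R2}(r^M, s^M) \right). \tag{33}$$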

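We now derive the update steps, beginning with the update for $r^M$. The Lagrangian
for the optimization of $r^M_v$ is

\begin{align}
& L_{3\text{-}1}(r^M_v, \lambda_{3\text{-}1}) \tag{34} \\
&= \lambda_{R1} D(q^M_v(X_v) \| r^M_v(X_v)) + \lambda_G \sum_{(u,v) \in E'_{\text{GBR}}} w'(u,v)\, D(s^M_u(X_u) \| r^M_v(X_u)) \tag{35} \\
&\quad + \lambda_{3\text{-}1} \Big(1 - \sum_{x_v} r^M_v(x_v)\Big) + K_{3\text{-}1}(q, r^M_{\setminus v}). \tag{36}
\end{align}

Setting the derivative to zero,

\begin{align}
0 = \frac{\partial L_{3\text{-}1}}{\partial r^M_v(x_v)} &= -\Big[ \lambda_{R1} q^M_v(x_v) + \lambda_G \sum_{(u,v) \in E'_{\text{GBR}}} w'(u,v)\, s^M_u(x_v) \Big] \frac{1}{r^M_v(x_v)} - \lambda_{3\text{-}1} \tag{37} \\
\Rightarrow \quad r^M_v(x_v) &\propto \lambda_{R1} q^M_v(x_v) + \lambda_G \sum_{(u,v) \in E'_{\text{GBR}}} w'(u,v)\, s^M_u(x_v). \tag{38}
\end{align}

The normalization constant is

\begin{align}
& \sum_{x_v} \Big( \lambda_{R1} q^M_v(x_v) + \lambda_G \sum_{(u,v) \in E'_{\text{GBR}}} w'(u,v)\, s^M_u(x_v) \Big) \tag{39} \\
&= \lambda_{R1} \sum_{x_v} q^M_v(x_v) + \lambda_G \sum_{(u,v) \in E'_{\text{GBR}}} w'(u,v) \sum_{x_v} s^M_u(x_v) \tag{40} \\
&= \lambda_{R1} + \lambda_G \sum_{(u,v) \in E'_{\text{GBR}}} w'(u,v). \tag{41}
\end{align}

We next derive the update for $s^M$. The Lagrangian for the optimization of $s^M_u$ is

\begin{align}
& L_{3\text{-}2}(s^M_u, \lambda_{3\text{-}2}) \tag{42} \\
&= \lambda_G \sum_{(u,v) \in E'_{\text{GBR}}} w'(u,v)\, D(s^M_u(X_u) \| r^M_v(X_u)) + \lambda_{3\text{-}2} \Big(1 - \sum_{x_u} s^M_u(x_u)\Big) + K_{3\text{-}2}(q, r^M, s^M_{\setminus u}) \tag{43} \\
&= \lambda_G \sum_{(u,v) \in E'_{\text{GBR}}} w'(u,v) \sum_{x_u} s^M_u(x_u) \log \frac{s^M_u(x_u)}{r^M_v(x_u)} + \lambda_{3\text{-}2} \Big(1 - \sum_{x_u} s^M_u(x_u)\Big) + K_{3\text{-}2}(q, r^M, s^M_{\setminus u}). \tag{44}
\end{align}

Setting the derivative to zero,

\begin{align}
0 = \frac{\partial L_{3\text{-}2}}{\partial s^M_u(x_u)} &= \lambda_G \sum_{(u,v) \in E'_{\text{GBR}}} w'(u,v) \log \frac{1}{r^M_v(x_u)} + \lambda_G \Big( \sum_{(u,v) \in E'_{\text{GBR}}} w'(u,v) \Big) \big(1 + \log s^M_u(x_u)\big) - \lambda_{3\text{-}2} \tag{45} \\
\Rightarrow \quad s^M_u(x_u) &\propto \exp\left( \frac{\sum_{(u,v) \in E'_{\text{GBR}}} w'(u,v) \log r^M_v(x_u)}{\sum_{(u,v) \in E'_{\text{GBR}}} w'(u,v)} \right). \tag{46}
\end{align}

□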
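
The two closed-form updates of Theorem 3 can be sketched in a few lines. This is a minimal illustration under stated assumptions rather than the Measure Propagation implementation: all function names are hypothetical, marginals are stored as lists of lists, edges are a dict from directed pairs `(u, v)` to weights (an undirected graph would list both directions), `lam_r2` must be positive so that every vertex has a self-edge, and each $s^M$ update uses the most recent $r^M$.

```python
import math

def add_self_edges(weights, n, lam_r2, lam_g):
    """Build w': copy the edge weights and add a self-edge of weight lam_r2 / lam_g per vertex."""
    w_prime = dict(weights)
    for h in range(n):
        w_prime[(h, h)] = w_prime.get((h, h), 0.0) + lam_r2 / lam_g
    return w_prime

def update_r(q_marg, s, w_prime, lam_r1, lam_g):
    """r_v(x) = (lam_r1 q_v(x) + lam_g sum_u w'(u,v) s_u(x)) / (lam_r1 + lam_g sum_u w'(u,v))."""
    k = len(q_marg[0])
    r = []
    for v in range(len(q_marg)):
        incoming = [(u, w) for (u, head), w in w_prime.items() if head == v]
        denom = lam_r1 + lam_g * sum(w for _, w in incoming)
        r.append([
            (lam_r1 * q_marg[v][x] + lam_g * sum(w * s[u][x] for u, w in incoming)) / denom
            for x in range(k)
        ])
    return r

def update_s(r, w_prime):
    """s_u(x) is proportional to exp(sum_v w'(u,v) log r_v(x) / sum_v w'(u,v))."""
    k = len(r[0])
    s = []
    for u in range(len(r)):
        outgoing = [(v, w) for (tail, v), w in w_prime.items() if tail == u]
        total = sum(w for _, w in outgoing)
        logits = [sum(w * math.log(r[v][x]) for v, w in outgoing) / total for x in range(k)]
        top = max(logits)  # subtract the maximum for numerical stability
        unnorm = [math.exp(value - top) for value in logits]
        z = sum(unnorm)
        s.append([value / z for value in unnorm])
    return s

def r_star(q_marg, weights, lam_r1, lam_r2, lam_g, iters=100):
    """Alternate the two closed-form updates for a fixed number of sweeps."""
    n, k = len(q_marg), len(q_marg[0])
    w_prime = add_self_edges(weights, n, lam_r2, lam_g)
    s = [[1.0 / k] * k for _ in range(n)]  # uniform start (arbitrary choice)
    r = update_r(q_marg, s, w_prime, lam_r1, lam_g)
    for _ in range(iters):
        s = update_s(r, w_prime)
        r = update_r(q_marg, s, w_prime, lam_r1, lam_g)
    return r, s
```

For example, `r_star([[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]], {(0, 1): 1.0, (1, 0): 1.0}, 1.0, 1.0, 1.0)` pulls the marginals of the two connected vertices toward each other and leaves the isolated third vertex at its own marginal. A fixed sweep count stands in for a convergence test to keep the sketch short.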

Updating $\theta$

The preceding section described an algorithm for computing $\arg\max_q \mathcal{R}'_{\text{GBR-R2}}$. This algorithm can
be combined with an EM-like algorithm in order to learn a $\theta$ that (locally) optimizes $\mathcal{J}_{\text{GBR-R2}}$, as
we describe in this section. We use an alternating EM-like algorithm to compute $\theta$.

E-step: Compute $(q^{(t+1)}, r^{M(t+1)}, s^{M(t+1)}) \in \arg\max_{q, r^M, s^M} \mathcal{J}'_{\text{GBR-R2}}(\theta^{(t)}, q, r^M, s^M)$
M-step: Compute $\theta^{(t+1)} \in \arg\max_\theta \mathcal{J}'_{\text{GBR}}(\theta, q^{(t+1)})$
The preceding section showed how to perform the E-step. To compute the M-step,

$$\arg\max_\theta \mathcal{J}'_{\text{GBR}}(\theta, q^{(t+1)}) = \arg\max_\theta E_{q^{(t+1)}(X_H)}\big[\log p_\theta(X_H, x_O)\big]. \tag{47}$$

The M-step takes the same form as the EM algorithm presented in (Neal and Hinton, 1999).
The update for $\theta$ depends on the particular factorization and parameterization properties of the
model. Because the posterior distribution $q(X_H)$ obeys the same factorization properties as the
unregularized model $p_\theta(X_H, x_O)$, the same closed-form updates for $\theta$ can be used.

Therefore, $\mathcal{J}_{\text{GBR-R2}}$ can be optimized using a three-way alternating maximization algorithm,
which proceeds by alternating updates to $r^M$ and $s^M$ until convergence, alternating this whole
$r^M$/$s^M$ update with updates to $q$ until convergence, and finally alternating updates to $q$ and $\theta$ until convergence.
A schematic of the algorithm and objective appears in Supplementary Figure 6, and the algorithm is
shown in full in Algorithm 1.

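Algorithm 1 Efficient and scalable algorithm to optimize $\mathcal{J}_{\text{GBR-R2}}$

function $r^{M*}(q)$
    for $h \in H$ do
        $q^M_h(x_h) \leftarrow \sum_{x_{H \setminus h}} q(x_H)$   (belief propagation)
    end for
    Initialize $r^{M(0)}$, $s^{M(0)}$ arbitrarily.
    $t_1 \leftarrow 1$
    while not converged do
        for $v \in H$ do
            $r^{M(t_1)}_v(x_v) \leftarrow \dfrac{\lambda_{R1} q^M_v(x_v) + \lambda_G \sum_{(u,v) \in E'_{\text{GBR}}} w'(u,v)\, s^{M(t_1 - 1)}_u(x_v)}{\lambda_{R1} + \lambda_G \sum_{(u,v) \in E'_{\text{GBR}}} w'(u,v)}$
        end for
        for $u \in H$ do
            $s^{M(t_1)}_u(x_u) \leftarrow \dfrac{\exp\left( \sum_{(u,v) \in E'_{\text{GBR}}} w'(u,v) \log r^{M(t_1 - 1)}_v(x_u) \,/\, \sum_{(u,v) \in E'_{\text{GBR}}} w'(u,v) \right)}{\sum_{x'_u} \exp\left( \sum_{(u,v) \in E'_{\text{GBR}}} w'(u,v) \log r^{M(t_1 - 1)}_v(x'_u) \,/\, \sum_{(u,v) \in E'_{\text{GBR}}} w'(u,v) \right)}$
        end for
        $t_1 \leftarrow t_1 + 1$
    end while
    return $r^{M(t_1 - 1)}$
end function

function $q^*(\theta)$
    $t_2 \leftarrow 1$
    Initialize $r^{M(0)}$ arbitrarily.
    while not converged do
        $q^{(t_2)}(x_H) \leftarrow \dfrac{p_\theta(x_H, x_O)^{1/(1+\lambda_{R1})}\, r^{M(t_2 - 1)}(x_H)^{\lambda_{R1}/(1+\lambda_{R1})}}{\sum_{x'_H} p_\theta(x'_H, x_O)^{1/(1+\lambda_{R1})}\, r^{M(t_2 - 1)}(x'_H)^{\lambda_{R1}/(1+\lambda_{R1})}}$   (belief propagation)
        $r^{M(t_2)} \leftarrow r^{M*}(q^{(t_2)})$
        $t_2 \leftarrow t_2 + 1$
    end while
    return $q^{(t_2 - 1)}$
end function

Initialize $\theta^{(0)}$ arbitrarily.
$t_3 \leftarrow 1$
while not converged do
    $q^{(t_3)} \leftarrow q^*(\theta^{(t_3 - 1)})$
    $\theta^{(t_3)} \leftarrow \arg\max_\theta E_{q^{(t_3)}(X_H)}\big[\log p_\theta(X_H, x_O)\big]$   (EM update)
    $t_3 \leftarrow t_3 + 1$
end while
Output $\theta^{(t_3 - 1)}$

Theorem 5. The modified EM algorithm monotonically increases the GBR objective:

$$\mathcal{J}_{\text{GBR-R2}}(\theta^{(t)}) \le \mathcal{J}_{\text{GBR-R2}}(\theta^{(t+1)}). \tag{48}$$

Proof. Function $q^*(\cdot)$ of Algorithm 1 implements coordinate descent on $q$, $r^M$ and $s^M$. $D(p \| q)$ and
$D(p \| p)$ are jointly strictly convex in $p$ and $q$ and bounded below by 0. Thus, $\mathcal{J}'_{\text{GBR-R2}}$ is bounded
below and jointly strictly convex in $q$, $r^M$ and $s^M$. Convergence to the global optimum of $\mathcal{J}'_{\text{GBR-R2}}$
in $q$, $r^M$ and $s^M$ follows from its strict convexity (Warga, 1963).

\begin{align}
\mathcal{J}_{\text{GBR-R2}}(\theta^{(t)}) &= \mathcal{J}'_{\text{GBR-R2}}(\theta^{(t)}, q^{(t+1)}, r^{M(t+1)}, s^{M(t+1)}) \\
&\le \mathcal{J}'_{\text{GBR-R2}}(\theta^{(t+1)}, q^{(t+1)}, r^{M(t+1)}, s^{M(t+1)}) \\
&\le \mathcal{J}_{\text{GBR-R2}}(\theta^{(t+1)})
\end{align}

The first equality follows from the global optimality of $q^{(t+1)}$, $r^{M(t+1)}$ and $s^{M(t+1)}$. The second
inequality follows from the fact that $\theta^{(t+1)}$ is chosen to maximize $\mathcal{J}'_{\text{GBR-R2}}$. The third inequality
follows from the fact that $\mathcal{J}'_{\text{GBR-R2}}(\theta, q, r^M, s^M)$ is a lower bound on $\mathcal{J}_{\text{GBR-R2}}(\theta)$. □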
Computation 

Probabilistic inference for computing $q$ and $\theta$ was performed on the DBN model using the Graphical
Models Toolkit (GMTK) (Bilmes and Zweig, 2002). GMTK computations were distributed over a
cluster using Grid Engine. Alternating minimization for updating $r^M$ and $s^M$ was performed using
the Measure Propagation package (Subramanya and Bilmes, 2011).

2 Graph-based regularization outperforms an alternative approach based on 
approximate inference 

To evaluate GBR's performance relative to related methods, we compared GBR to approximate
inference on a graphical model with the same dependence structure. We compared to loopy belief
propagation (LBP) because it is one of the most widely used approximate inference methods.
While we would have preferred to perform this comparison using real data sets, it appeared that
even our fastest LBP implementation would take months to converge. Therefore, we instead
performed this comparison using synthetic data. We generated a chain of length $n = 300$, with
$(X_H, X_O) = (Z_{1:300}, Y_{1:300})$, where $Z_{1:300} \in \{0, 1\}^n$ and $Y_{1:300} \in \mathbb{R}^n$. We defined an HMM over this
chain with transition probabilities $\Pr(Z_i = Z_{i+1}) = 0.8$ and emission probabilities $Y_i \sim N(Z_i, \sigma)$,
where we vary $\sigma$ to control the difficulty of the problem: higher $\sigma$ results in more challenging
inference. We generated a graph $W \in \mathbb{R}^{n \times n}$ over the vertices of the chain by setting $w_{ij} = 1$ with
probability 0.2 if $Z_i = Z_j$, $w_{ij} = 1$ with probability 0.1 if $Z_i \neq Z_j$, and $w_{ij} = 0$ otherwise. This
model is meant to simulate the task of labeling a chain (such as a genomic sequence) where we have
noisy information about which pairs of positions have the same label.
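
The generative procedure described above can be sketched as follows. This is an illustrative reconstruction, not the authors' script: the function name and its parameters are hypothetical, and several details the text leaves open are filled in as labeled assumptions (a uniform initial state, $\sigma$ treated as a standard deviation, a symmetric $W$ with no self-edges).

```python
import random

def make_synthetic(n=300, sigma=1.0, stay=0.8, p_same=0.2, p_diff=0.1, seed=0):
    """Sample hidden labels z, observations y and a noisy similarity graph w."""
    rng = random.Random(seed)
    # Hidden chain: keep the previous label with probability `stay`.
    z = [rng.randint(0, 1)]  # assumption: uniform initial state
    for _ in range(n - 1):
        z.append(z[-1] if rng.random() < stay else 1 - z[-1])
    # Emissions: Gaussian centered on the label.
    y = [rng.gauss(zi, sigma) for zi in z]  # assumption: sigma is a standard deviation
    # Graph: edges are more likely between positions that share a label.
    w = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            p_edge = p_same if z[i] == z[j] else p_diff
            if rng.random() < p_edge:
                w[i][j] = w[j][i] = 1  # assumption: W is symmetric
    return z, y, w
```

Calling `make_synthetic(sigma=2.0)` yields a harder instance than `make_synthetic(sigma=0.5)`, since the two emission distributions overlap more as `sigma` grows.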

We compared three methods of inference: 1) inference on the chain alone, without using $W$;
2) LBP on the chain plus extra factors of $\Pr(X_i = X_j) = \text{sigmoid}(\lambda w_{ij})$, where $\lambda$ controls the
strength of these factors; and 3) GBR using the regularization graph $W$. In order to give LBP
the greatest advantage possible, for each value of $\sigma$ we varied $\lambda$ from $10^{-10}$ to 1 and picked the
value that produced the best performance. GBR has the best performance for all experiments, and
significantly outperforms the other methods for large values of $\sigma$ (Supplementary Figure 7).

3 Segway model

We used graph-based regularization to augment the Segway semi-automated genome annotation 
method (Hoffman et al., 2012). Segway uses a dynamic Bayesian network model to perform genome 
annotation. The model is presented in detail in (Hoffman et al., 2012), but we describe it briefly 
here. 

• We define a latent label variable $Y_i \in \{1..K\}$ for each position $i \in \{1..N\}$ in the genome,
where $K$ is the user-specified number of labels and $N$ is the number of positions.

• We define observed signal data variables $X_{ij}$ representing the value of signal data set $j \in
\{1..M\}$ at genomic position $i$, where $M$ is the number of signal data sets. We downsample the
genome into bins of size $R$ and average the signal data in each bin (after applying the inverse
hyperbolic sine transform), so $N \approx 3 \times 10^9 / R$. Because the sequencing depth of existing Hi-C
data sets is too low to achieve single-base-pair resolution, we used $R = 10000$ for experiments
using GBR to integrate Hi-C data. We used $R = 1$ for experiments using GBR to transfer
information between cell types.

• The observed data variable $X_{ij}$ depends only on the label at position $i$, $Y_i$. We model the
variable $X_{ij}$ as a Gaussian distribution with data set- and label-specific mean parameter $\mu_{\ell,j}$
and data set-specific variance parameter $\sigma_j$. In the case that some data values are missing due
to mappability, we weight the observation of $X_{ij}$ by the proportion of mappable positions
$\bar{X}_{ij}$ in bin $i$ in data set $j$.

• The label variable $Y_i$ depends only on the label at the previous position $Y_{i-1}$. We model the
label transition from label $a$ to label $b$ using a transition parameter $Q_{a,b}$.

• The parameters $\mu_{1:K,1:M}$, $\sigma_{1:M}$ and $Q_{1:K,1:K}$ are learned through EM.
The overall log-likelihood of the Segway model is defined as:

\begin{align}
\log P(X, Y \mid \mu, \sigma, Q) = {} & \sum_{i=1}^{N} \sum_{j=1}^{M} \bar{X}_{ij} \log N(X_{ij} \mid \mu_{Y_i,j}, \sigma_j) \\
& + \sum_{i=1}^{N-1} \lambda_{\text{transition}}\, \mathbb{1}(Y_i = Y_{i+1}) \log Q_{Y_i,Y_i} \tag{49} \\
& + \sum_{i=1}^{N-1} \mathbb{1}(Y_i \neq Y_{i+1}) \big( \lambda_{\text{transition}} \log(1 - Q_{Y_i,Y_i}) + \log Q_{Y_i,Y_{i+1}} \big)
\end{align}

where $\mu_{\ell,j}$ is the mean associated with signal data set $j$ and label $\ell$; $\sigma_j$ is the variance associated
with signal data set $j$ (shared between all labels); $\bar{X}_{ij}$ is the proportion of mappable positions in bin
$i$ for data set $j$; $Q_{a,b}$ is the transition probability parameter from label $a$ to label $b$; and $\lambda_{\text{transition}}$ is
a weight on the transitions relative to the emissions of the model.
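
The downsampling step described in the model definition (inverse hyperbolic sine transform followed by averaging within bins of size $R$) can be sketched as below. The function name and the convention of marking unmappable positions with `None` are assumptions made for this illustration and are not taken from Segway; the second value returned for each bin is the proportion of mappable positions that the model uses as an observation weight.

```python
import math

def bin_signal(signal, bin_size):
    """Return (mean asinh signal, mappable fraction) for each bin.

    `signal` is a list of per-position values; None marks an unmappable
    position (an assumption of this sketch, not a Segway convention).
    """
    bins = []
    for start in range(0, len(signal), bin_size):
        window = signal[start:start + bin_size]
        # Transform first, then average, as described in the text.
        observed = [math.asinh(v) for v in window if v is not None]
        mean = sum(observed) / len(observed) if observed else None
        bins.append((mean, len(observed) / len(window)))
    return bins
```

With `bin_size=1` each position forms its own bin, which corresponds to the $R = 1$ setting used for the cell-type transfer experiments.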

4 Review of existing SAGA methods for using data from multiple cell types 

Existing methods for semi-automated genome annotation work well on data from a single cell
type, but annotating multiple cell types remains an active area of research. There are three simple
strategies for annotating multiple cell types. The first and simplest strategy is to apply
the same model to all cell types (sometimes called "concatenated" annotation) (Sheffield et al.,
2013), but this requires that all cell types have the same set of available data, which is not generally
true. Moreover, in practice, experimental artifacts lead to poor performance for models that
use the same parameters for data from multiple experiments; for example, such models may
assign a separate set of labels to each cell type. Second, one could perform
annotation separately on each cell type and find a mapping between the labels (for example, by
using the Hungarian algorithm (Kuhn, 1955)). However, since different cell types generally have
different types of activity and different sets of signal data sets, such a mapping is generally very
poor. Third, one could use all data from all cell types in one model (sometimes called "stacked"
annotation), but this strategy must either give the same label to each position for every cell type or
use a separate label for each pattern of labels across cell types, which requires an exponentially large
number of labels.

Two additional methods have been proposed to annotate multiple cell types. The first, called
hiHMM ("hierarchically-linked infinite HMM"), maintains a separate model for each cell type and
uses a regularization penalty to encourage the models to have similar parameters (Ho et al., 2014).
This addresses the problem of requiring the same set of data across cell types, but does not share any
position-specific information between cell types. The second method for performing multi-cell-line
annotation, called TreeHMM, is given a tree over cell types and models the transitions between
labels both at neighboring positions and at neighboring developmental states (Biesinger
et al., 2013). This model can integrate position-specific information between cell types, but requires
that each cell type have the same available data and is sensitive to any cross-experiment artifacts.
Moreover, the complexity of this model forced the authors to resort to approximate methods for
inference, which likely decreases the quality of the resulting annotations.

These two problems — requiring a common set of data and failure to integrate evidence — are 
especially important because, although there are virtually limitless cell types and cell states that 
one would like to understand, very limited numbers of experiments have been performed in most of 
these cell types due to the cost of genomic experiments. For example, ENCODE has performed 335 
experiments in its most-studied cell type, but has performed just 2-10 experiments in more than 
100 cell types. 

Transferring information with GBR removes the requirement for a common set of data across
cell types and integrates position-specific evidence across cell types. Therefore, GBR provides a
method that leverages all available data in order to produce high-quality annotations of each cell type.

5 Related optimization methods 

Clearly, the most straightforward way to express pairwise interactions in a graphical model is to
encode them in the underlying graph and to use approximate methods (reviewed in (Wainwright
and Jordan, 2008)) to enable inference. This form of interaction is quite general, in that when one
adds a factor $\phi(y_i, y_j)$ between two random variables $Y_i$ and $Y_j$, these random variables may have
any type of interaction, expressed by $\phi(y_i, y_j)$. GBR, on the other hand, asks only for similarity
between the marginals, meaning that $p(y_i \mid \cdot)$ and $p(y_j \mid \cdot)$ should be similar. Alternatively, a factor
could encode such similarity, for example if $\phi(y_i, y_j) = \lambda \mathbb{1}(y_i = y_j)$. Such factors added to an HMM
or CRF would result in a high-treewidth model that can be dealt with using approximate inference.
Doing so, however, loses any guarantee of optimality (which we preserve with GBR).

The posterior regularization framework of Ganchev et al. (Ganchev et al., 2010) takes an approach 
similar to ours, augmenting a simple model in a way that maintains tractable inference. This 
method adds a regularization term to an EM objective in order to require the posterior probabilities 
to satisfy logical constraints in expectation. Ganchev et al. show how to optimize this combined 
objective efficiently when the regularization term is linear in the posterior distribution of the model. 
Unfortunately, pairwise similarity relationships cannot be expressed with such a linear regularization 
term. 

The work most similar to ours comprises the following three methods for graph regularization. First,
Altun et al. (Altun et al., 2005) describe a graph regularization applied to a max-margin model for
pitch-accent prediction and optical character recognition. However, this method involves a matrix
inversion step, and thus cannot scale to large models. Second, Subramanya et al. (Subramanya et al.,
2010) combine a temporal CRF with a regularizer that expresses pairwise squared-error penalties
derived from unlabeled data. They apply this method to the part-of-speech tagging task (Subramanya
et al., 2010) and later to related problems in natural language (Das and Petrov, 2011; Das and Smith,
2011). That work, however, resorts to a purely heuristic update step and lacks any optimality
guarantees. Third, He et al. (He et al., 2013) present an approach based on an exponentiated gradient
descent algorithm. Like our approach, He's approach exhibits monotone convergence. Although
He's work has many similarities with our approach, our methods were developed independently,
and He's work differs from ours in three important ways. First, He et al. (He et al., 2013) use
an exponentiated gradient descent strategy, while we use alternating minimization. Second, He's
method uses a squared-error penalty, which is inappropriate for probability distributions, unlike our
use of the Kullback-Leibler divergence (Bishop, 1995, p. 226). Third, the exponentiated gradient
descent method is applied to handwriting recognition and part-of-speech tagging, while we apply
GBR to genome annotation.

Supplementary Figures/Tables

Data type                                       URL
CAGE                                            http://www.gencodegenes.org/releases/7.html
ENCODE (ChIP-seq, DNase, Replication timing)    http://hgdownload.cse.ucsc.edu/goldenPath/hg19/encodeDCC/
ChIP-seq (Roadmap)                              http://www.roadmapepigenomics.org/data
Hi-C                                            http://yuelab.org/hi-c/download.html



Supplementary Table 1: Data sources

Input data sets

IMR90 domain annotation
DNase
H2aK5ac
H2aK9ac
H2a.Z
H2bK120ac
H2bK12ac
H2bK15ac
H2bK20ac
H2bK5ac
H3K14ac
H3K18ac
H3K23ac
H3K27ac
H3K27me3
H3K36me3
H3K4ac
H3K4me1
H3K4me2
H3K4me3
H3K56ac
H3K79me1
H3K79me2
H3K9ac
H3K9me1
H3K9me3
H4K20me1
H4K5ac
H4K8ac
H4K91ac
Repli-seq

Eight-cell type domain annotation
DNase
H2a.Z
H3K27ac
H3K27me3
H3K36me3
H3K4me1
H3K4me2
H3K4me3
H3K79me2
H3K9ac
H3K9me3
H4K20me1

GM12878 reduced annotation
H3K4me1
H3K4me2
H3K4me3
H3K9ac
H3K27ac
H3K27me3
H3K36me3
H4K20me1



Number of labels
Transition weight ($\lambda_{\text{transition}}$)
Number of random EM initializations
GBR graph scale ($\lambda_G$)
GBR optimization hyperparameter ($\lambda_R$)



10 

1 
1 



12 
10 

1 
1 



25 
1 

10 
1 

10 



Supplementary Table 2: Parameters of all genome annotations

GO ID           Bonferroni-corrected p-value  Name
GO:0008150      5.44809636039661e-49          biological process
GO:0070647      5.36690477736568e-14          protein modification by small protein conjugation or removal
GO:0016567      2.53980288482968e-13          protein ubiquitination
GO:0032446      3.48958938873894e-13          protein modification by small protein conjugation
GO:0071840      1.2816846277191e-07           cellular component organization or biogenesis
GO:0048522      3.15663523300463e-07          positive regulation of cellular process
GO:0016043      1.55013794212494e-06          cellular component organization
GO:0050953      2.07559208909527e-06          sensory perception of light stimulus
GO:(missing)    2.79978686922173e-06          visual perception
GO:0006996      2.18396515274592e-05          organelle organization
GO:0016071      7.40708736771575e-05          mRNA metabolic process
GO:0048519      0.000123587368672724          negative regulation of biological process
GO:0023056      0.000191430490530164          positive regulation of signaling
GO:0048518      0.000239550248137195          positive regulation of biological process
GO:0032270      0.000250487526217915          positive regulation of cellular protein metabolic process
GO:0018146      0.000336577942863192          keratan sulfate biosynthetic process
GO:0007005      0.000402299808664309          mitochondrion organization
GO:0010647      0.000414643467676553          positive regulation of cell communication
GO:0032268      0.000530771233104358          regulation of cellular protein metabolic process
GO:0043928      0.000623789428174407          exonucleolytic nuclear-transcribed mRNA catabolic process involved in deadenylation-dependent decay
GO:0000288      0.000670978259119087          nuclear-transcribed mRNA catabolic process, deadenylation-dependent decay
GO:1903320      0.00117566239446111           regulation of protein modification by small protein conjugation or removal
GO:0009967      0.00131477493050438           positive regulation of signal transduction
GO:1902533      0.00136644459633516           positive regulation of intracellular signal transduction
GO:0048523      0.00148158842395948           negative regulation of cellular process
GO:0051340      0.00150521495507962           regulation of ligase activity
GO:(missing)    0.00193533449146964           nuclear-transcribed mRNA catabolic process, exonucleolytic
GO:(illegible)  0.00194625332376889           positive regulation of protein modification process
GO:1903047      0.00215993757988771           mitotic cell cycle process
GO:0042531      0.00237481842965934           positive regulation of tyrosine phosphorylation of STAT protein
GO:0007267      0.0039286364798486            cell-cell signaling
GO:0006401      0.00399526165008678           RNA catabolic process
GO:0031396      0.00443702013802013           regulation of protein ubiquitination
GO:0000278      0.00470584872871359           mitotic cell cycle
GO:0051351      0.00495154943903681           positive regulation of ligase activity
GO:0051438      0.00524041402868326           regulation of ubiquitin-protein transferase activity
GO:0042339      0.0053487952592411            keratan sulfate metabolic process
GO:0051247      0.0056999635780996            positive regulation of protein metabolic process
GO:0016265      0.00629144608895318           death
GO:0046427      0.00671456368410498           positive regulation of JAK-STAT cascade
GO:0008219      0.0067422659494943            cell death
GO:0000956      0.00698342075165014           nuclear-transcribed mRNA catabolic process
GO:0031399      0.00786100228743956           regulation of protein modification process
GO:0044770      0.00795426575038111           cell cycle phase transition
GO:0051246      0.00937868333257509           regulation of protein metabolic process



Supplementary Table 3: GO terms enriched for genes in BRD domains

GO ID           Bonferroni-corrected p-value  Name
GO:0001944      2.9592753850599e-11           vasculature development
GO:0001568      1.34928597594565e-09          blood vessel development
GO:0072358      1.87785996344277e-09          cardiovascular system development
GO:0072359      1.94865728031881e-09          circulatory system development
GO:0048598      2.33761516882996e-09          embryonic morphogenesis
GO:0032501      4.29504418805107e-09          multicellular organismal process
GO:0044707      6.45108090271442e-09          single-multicellular organism process
GO:0007275      9.54873387980616e-09          multicellular organismal development
GO:0032502      9.87403900226897e-09          developmental process
GO:0009653      1.13900576119261e-08          anatomical structure morphogenesis
GO:0048646      1.27560082632202e-08          anatomical structure formation involved in morphogenesis
GO:0048856      3.44027824649974e-08          anatomical structure development
GO:0048514      3.64379056813118e-08          blood vessel morphogenesis
GO:0044767      4.96188838214423e-08          single-organism developmental process
GO:0008150      1.09920334986419e-07          biological process
GO:0009888      1.15062070196739e-07          tissue development
GO:0048731      1.34005756992352e-07          system development
GO:0030198      2.54744657740361e-07          extracellular matrix organization
GO:0043062      2.70008497648116e-07          extracellular structure organization
GO:0040011      4.09681657551741e-07          locomotion
GO:0048523      6.00924738320409e-07          negative regulation of cellular process
GO:0048513      1.27555684817328e-06          organ development
GO:0009887      1.67228271838845e-06          organ morphogenesis
GO:0060429      1.75248481849761e-06          epithelium development
GO:0009605      2.12388368291444e-06          response to external stimulus
GO:0048519      2.70812717394184e-06          negative regulation of biological process
GO:0048869      3.99639769208689e-06          cellular developmental process
GO:0009790      4.8142081410924e-06           embryo development
GO:0030154      8.17799427787657e-06          cell differentiation
GO:0048522      8.79037004023573e-06          positive regulation of cellular process
GO:0048870      9.70748540194382e-06          cell motility
GO:0051674      9.70748540194382e-06          localization of cell
GO:0048518      9.8762526690946e-06           positive regulation of biological process
GO:1902533      1.75047286259699e-05          positive regulation of intracellular signal transduction
GO:0071840      2.37299292646679e-05          cellular component organization or biogenesis
GO:0030334      2.38844411436045e-05          regulation of cell migration
GO:0007389      3.41057645381657e-05          pattern specification process
GO:0051270      4.15793772024666e-05          regulation of cellular component movement
GO:2000145      7.20700858370221e-05          regulation of cell motility
GO:0040012      7.2398136159996e-05           regulation of locomotion
GO:0016043      7.57778182419685e-05          cellular component organization
GO:0001501      9.16531463777006e-05          skeletal system development
GO:0008219      0.000103725100284498          cell death
GO:0001525      0.0001112117619674            angiogenesis
GO:0009966      0.000112414145595174          regulation of signal transduction
GO:0016265      0.000115956206339428          death
GO:0009967      0.000119265054523972          positive regulation of signal transduction
GO:0016477      0.000141195508435494          cell migration
GO:0048568      0.000148183903115669          embryonic organ development
GO:0001503      0.000148474000627697          ossification
GO:0006915      0.000174710834263385          apoptotic process
GO:0048729      0.000185804853094102          tissue morphogenesis
GO:0012501      0.000193299675551627          programmed cell death
GO:0010647      0.000264669391358318          positive regulation of cell communication
GO:0003007      0.000308615021011459          heart morphogenesis
GO:0010628      0.000312006014076936          positive regulation of gene expression
GO:0023056      0.000458831197316147          positive regulation of signaling
GO:0051239      0.000694243253157008          regulation of multicellular organismal process
GO:0031325      0.000823302329284296          positive regulation of cellular metabolic process
GO:0002009      0.000843967242091738          morphogenesis of an epithelium
GO:0048863      0.000980998073137653          stem cell differentiation
GO:0042325      0.00125849445147922           regulation of phosphorylation
GO:1902531      0.00144207328720399           regulation of intracellular signal transduction
GO:0044236      0.00151180971326667           multicellular organismal metabolic process
GO:0009893      0.00161958399151636           positive regulation of metabolic process
GO:0022603      0.00169126335423007           regulation of anatomical structure morphogenesis
GO:0035295      0.00223230684414793           tube development
GO:0006928      0.00230989986294592           cellular component movement


GO 


:0010604 


0.00236857824547227 


positive regulation of macromolcculc metabolic process 


GO 


:0010646 


0.00252308440283661 


regulation of cell communication 


GO 


:0050793 


0.0026847817189637 


regulation of developmental process 


GO 


:0048562 


0.00285877724714378 


embryonic organ morphogenesis 


GO 


:0032879 


0.00295955595643228 


regulation of localization 


GO 


:0080134 


0.00335551717091664 


regulation of response to stress 


GO 


:0048705 


0.00370439844279299 


skeletal system morphogenesis 


GO 


:0048864 


0.00389984096973906 


stem cell development 


GO 


:0023051 


0.0040158774598731 


regulation of signaling 


GO 


0006334 


0.00406223628626311 


nuclcosomc assembly 


GO 


:0045893 


0.00414091776655817 


positive regulation of transcription, DNA-tcmplatcd 


GO 


0007167 


0.00519580828357705 


enzyme linked receptor protein signaling pathway 


GO 


:0003002 


0.00540223999981726 


regionalization 


GO 


:0051254 


0.00549153180374605 


positive regulation of RNA metabolic process 


GO 


:1902680 


0.00598393700399004 


positive regulation of RNA biosynthctic process 


GO 


:0023014 


0.00676138544251521 


signal transduction by phosphorylation 


GO 


:0008283 


0.0070310418956878 


cell proliferation 


GO 


:0048583 


0.0072096008049822 


regulation of response to stimulus 


GO 




0.00732272231577271 


chromatin assembly or disassembly 


GO 


:0007507 


U.UU I OOU t OZZ i 


heart development 


GO 


:0042127 


0.00771299785801664 


regulation of cell proliferation 


GO 


:2000026 


0.008432344878453 


regulation of multicellular organismal development 


GO 


:0010941 


0.00916678907171083 


regulation of cell death 


GO 


:0043408 


0.00935020093242137 


regulation of MAPK cascade 


GO 


:0044259 


0.00998773914798186 


multicellular organismal macromolcculc metabolic process 



Supplementary Table 4: GO terms enriched for genes in SPC domains. 

[Figure: panels A and B; axis label: Distance (bp)]

Supplementary Figure 1: (A) Average squared difference between replication timing values at left and 
right sides of significant contacts, as a function of genomic distance, relative to a permutation control 
(t-test 95% confidence interval, grey error regions). (B) Confusion matrix of Segway annotation 
labels at left and right sides of significant contacts (without GBR). Color depicts log2(obs/expected) 
relative to a permutation control (Methods). Pairs of annotation labels at significantly interacting 
positions match more often than expected by chance (binomial test p < 10^-16). 
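
The log2(obs/expected) quantity in panel (B) can be illustrated with a minimal sketch. The permutation scheme used here (shuffling the right-side labels while keeping the left-side labels fixed) is an assumption made for illustration; the actual control is the one described in Methods. The function name and its parameters are hypothetical.

```python
import math
import random
from collections import Counter


def log2_obs_over_expected(pairs, n_perm=200, seed=0):
    """Compare observed label-pair counts with a shuffled control.

    pairs: list of (left_label, right_label) tuples, one per contact.
    Assumption: the control shuffles right-side labels across contacts.
    """
    rng = random.Random(seed)
    left = [a for a, _ in pairs]
    right = [b for _, b in pairs]
    observed = Counter(pairs)

    # Accumulate label-pair counts over all permutations.
    expected = Counter()
    for _ in range(n_perm):
        shuffled = right[:]
        rng.shuffle(shuffled)
        expected.update(zip(left, shuffled))

    # Mean permuted count per pair is expected[key] / n_perm.
    return {
        key: math.log2(obs / (expected[key] / n_perm))
        for key, obs in observed.items()
        if expected[key] > 0
    }


# Toy input in which both sides of every contact carry the same label.
toy_pairs = [("x", "x")] * 50 + [("y", "y")] * 50
toy_result = log2_obs_over_expected(toy_pairs)
```

In this toy input every contact has matching labels, so the matching pairs score above zero. Label pairs that are never observed are omitted from the result rather than assigned negative infinity.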



[Figure: axis label: log(IMR90 interaction p-value)]

Supplementary Figure 2: Correlation of Hi-C contact strength between IMR90 and H1-hESC. X 
and Y axes are log p-values of association of a given pair of positions. Color indicates density of 
points. Black lines indicate density contours in 0.1% bins. Spearman r = 0.57. 

[Figure: axis label: Cell type]



Supplementary Figure 3: Distribution of domain labels across eight cell types. Y axis indicates 
log2(bases covered by label t in cell type A / (bases covered by label t / 8)). 
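
As a small worked sketch of the Y-axis quantity, the following computes, for one label, the log2 ratio of its coverage in each cell type to its mean coverage across cell types. It assumes that "bases covered by label t" in the denominator means the total over all cell types, so that dividing by 8 gives the per-cell-type mean. The function name and the toy numbers are hypothetical.

```python
import math


def label_enrichment(bases_by_cell_type):
    """Return log2(coverage in a cell type / mean coverage over cell types).

    bases_by_cell_type: dict mapping cell type name to the number of bases
    covered by one label in that cell type. Coverage must be positive,
    since log2(0) is undefined.
    """
    mean_coverage = sum(bases_by_cell_type.values()) / len(bases_by_cell_type)
    return {
        cell_type: math.log2(bases / mean_coverage)
        for cell_type, bases in bases_by_cell_type.items()
    }


# Toy example with two cell types instead of eight.
toy_enrichment = label_enrichment({"A": 300, "B": 100})
```

A value of 0 means the label covers exactly the average number of bases in that cell type, and a value of -1 means it covers half the average.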

Supplementary Figure 4: Visualization of GO term enrichment for genes in IMR90 (A) BRD 
domains and (B) SPC domains using REVIGO (Supek et al., 2011). Each bubble represents a 
cluster of related enriched GO terms. X and Y axes are projected semantic axes defined using 
multidimensional scaling on the semantic similarity of each pair of terms. 

Supplementary Figure 5: Enrichment of consistent Segway boundaries for consistent replication 
domain boundaries. (A) Fraction of consistent replication domain boundaries overlapping consistent 
Segway domain boundaries as a function of the overlap distance. (B) Same as (A), but fraction of 
Segway domain boundaries. Replication domain boundaries were called by BD Pope, T Ryba, V 
Dileep, F Yue, W Wu, O Denas, DL Vera, Y Wang, RS Hansen, TK Canfield et al. (unpublished). 
We defined replication boundaries occurring in more than 10 out of 18 cell types as consistent. 
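
The consistency threshold and the overlap fraction plotted in this figure can be sketched as follows. This is an illustration only: it assumes boundaries are given as integer genomic positions on one chromosome and that calls from different cell types have already been matched to a shared position. All function and variable names are hypothetical.

```python
from bisect import bisect_left


def consistent_boundaries(cell_type_counts, min_count=10):
    """Keep boundaries seen in more than min_count cell types.

    cell_type_counts: dict mapping boundary position to the number of
    cell types in which that boundary was called.
    """
    return sorted(pos for pos, n in cell_type_counts.items() if n > min_count)


def fraction_overlapping(query, reference, max_dist):
    """Fraction of query boundaries within max_dist bp of a reference boundary."""
    reference = sorted(reference)
    if not query:
        return 0.0
    hits = 0
    for pos in query:
        i = bisect_left(reference, pos)
        # Only the nearest reference boundary on each side can be closest.
        neighbors = reference[max(i - 1, 0):i + 1]
        if any(abs(pos - r) <= max_dist for r in neighbors):
            hits += 1
    return hits / len(query)


# Toy data: counts out of 18 cell types, and Segway boundary positions.
replication = consistent_boundaries({100: 11, 200: 10, 300: 18})
overlap = fraction_overlapping(replication, [105, 1000], max_dist=10)
```

Sweeping `max_dist` over a range of distances gives a curve of the kind shown in panel (A); swapping the roles of `query` and `reference` gives the analogue of panel (B).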

Supplementary Figure 6: Schematic of the three formulations of the objective and the alternating 
maximization strategy. Edges in this figure indicate KL terms, labeled according to their weight in 
the objective. Boxed formulae are update steps. We perform two reformulations, first splitting q 
into q and r^M, linked by a KL term of weight λ_R1, then splitting r^M into r^M and s^M, linked by a 
KL term of weight λ_R2. 

Supplementary Figure 7: Comparison between inference on chain model without pairwise prior, 
loopy belief propagation (LBP) and GBR. The Y axis shows label prediction accuracy, and the X 
axis shows the parameter σ, which controls the difficulty of inference. 